B.  INTRODUCTION

Interval-censored  (IC)  data  are  encountered  in  three  areas  of  breast  cancer  research. 
The  most  common  application  is  in  clinical  relapse  follow-up  studies  in  which  the  study 
endpoint  is  disease-free  survival.  When  a  patient  relapses,  it  is  usually  known  that  the 
relapse  takes  place  between  two  follow-up  visits,  and  the  exact  time  to  relapse  is  unknown. 
In  statistics,  we  say  relapse  time  is  interval  censored.  Interval  censoring  is  also  encountered 
in  breast  cancer  registry  studies  in  which  information  on  family  history  of  cancer  is  updated 
periodically.  The  Strang  Breast  Surveillance  Program  for  women  at  increased  risk  for  breast 
cancer,  for  instance,  has  enlisted  over  800  women  with  complete  pedigree  information  which 
is  verified  and  updated  continuously.  Family  history  data  such  as  age  at  diagnosis  of  a 
specific  cancer,  or  a  benign  but  risk-conferring  condition,  are  obtained  from  each  registrant 
at  each  update.  Time  to  a  cancer  event,  and  definitely  time  to  first  detection  of  a  benign 
condition,  are  at  best  known  to  fall  in  the  time  interval  between  the  last  update  and  age 
at  diagnosis.  A  third  but  increasingly  important  area  of  application  of  interval  censoring 
is  in  breast  cancer  chemoprevention  experiments  or  prevention  trials,  which  involve  the 
observation  of  one  or  more  surrogate  endpoint  biomarkers  (SEB)  over  time.  The  scientific 
question  of  interest  here  is  the  estimation  of  time  for  the  SEB  to  reach  a  target  value,  and 
time  from  cessation  of  intake  of  a  chemopreventive  agent  to  the  loss  of  its  protective  effect. 
Unfortunately,  the  exact  values  of  both  these  time  variables  are  known  only  to  lie  in  between 
two  successive  assay  inspection  times. 

Let $X$ denote a time-to-event variable with distribution function $F(x) = \Pr(X \le x)$ or,
equivalently, survival function $S(x) = 1 - F(x)$. In interval censoring, $X$ is not observed and
is known only to lie in an observable interval $(L, R)$. Under our previous DOD-funded grant,
we made fundamental contributions both to the theory of generalized maximum
likelihood (GML) estimation of $S$ and to the computation required for inference based on the
GML estimator (GMLE) $\hat{S}$ of $S$. These contributions are restricted to the case of univariate
interval-censored data.

Multivariate interval censoring involves $d \ge 2$ correlated $X$ variables, each of which
is subject to interval censoring. The main statistical concern here is the GML estimation
of the joint survival function $S(x_1, \ldots, x_d) = \Pr(X_1 > x_1, \ldots, X_d > x_d)$ and of the
correlations among the variables. Our interest in multivariate IC data is driven by needs arising
from two related areas of breast cancer research at Strang. First, our investigators in the
Strang Cancer Genetics Program want to study various patterns of familial aggregation of
breast, ovarian and other forms of cancer using family history data from the Strang Breast
Surveillance Program. Studies of familial early onset of breast cancer, and of breast-ovarian and
breast-prostate associations, will lead to multivariate IC data of high dimension; therefore,
a proper statistical procedure, together with practical software for handling such data, is
very much needed. Second, we are conducting a one-year chemoprevention trial of
indole-3-carbinol (I3C) for breast cancer prevention. In this prevention trial we are monitoring the
levels of two SEBs, a urinary estrogen metabolite ratio and a blood counterpart, both of
which are subject to interval censoring. An earlier dose-ranging study of I3C conducted by
Wong et al. [1] has been published.

Statistical analysis of multivariate IC data has never been attempted. In the multivariate
situation, modeling of the intercorrelated time-to-event variables and their dependency
structure will require a great deal of innovative thinking; moreover, GML computation for
realistic sample sizes can be prohibitively difficult.

The overall aim of this research proposal is to develop statistical inference for multivariate
interval-censored data that are encountered in breast cancer chemoprevention trials
employing multiple surrogate endpoint biomarkers, and in breast cancer registry follow-up
studies of familial aggregation of breast and other forms of cancer. Asymptotic generalized
maximum likelihood theory has been investigated, and a computer software package for
maximum likelihood inference and Kaplan-Meier type survival plots has been implemented.


C.  BODY 

Consider nonparametric estimation of the joint survival function
$S(x_1, \ldots, x_d) = \Pr(X_1 > x_1, \ldots, X_d > x_d)$ of $d \ge 2$ intercorrelated time-to-event
variables $X_1, \ldots, X_d$, each of which is subject to interval censoring. For ease of
presentation and without loss of generality, we shall restrict our discussion to the
bivariate case $X = (X_1, X_2)$.

Let $(U_i, V_i)$ denote two consecutive follow-up times corresponding to $X_i$, and let $(L_i, R_i)$
denote the observable interval-censored (IC) data for $X_i$, defined as

$$
(L_i, R_i) =
\begin{cases}
(0, U_i) & \text{if } X_i \le U_i, \\
(U_i, V_i) & \text{if } U_i < X_i \le V_i, \\
(V_i, +\infty) & \text{if } X_i > V_i,
\end{cases}
\tag{1}
$$

for $i = 1, 2$. Under this two-dimensional interval censorship model, data are always interval
censored, i.e., $L_i < R_i$ with probability one. If we allow the possibility of having exact
observations in the data, so that

$$
L_i = R_i = X_i, \tag{2}
$$

then (1) and (2) together define a two-dimensional mixed interval censorship model.
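
As an illustration of the censoring scheme in (1) and (2), the following minimal Python sketch maps a (normally unobservable) event time and a pair of follow-up times to the observed interval. The function names `observed_interval` and `observed_rectangle` and the `exact` flag are hypothetical, introduced only for this example; they are not taken from the software described in this report.

```python
import math


def observed_interval(x, u, v, exact=False):
    """Return the observable interval (L, R) for an event time x inspected
    at two consecutive follow-up times u < v, following (1).
    With exact=True the event time itself is observed, following (2)."""
    if not 0 <= u < v:
        raise ValueError("follow-up times must satisfy 0 <= u < v")
    if exact:
        return (x, x)            # exact observation: L = R = X
    if x <= u:
        return (0.0, u)          # event occurred by the first visit
    if x <= v:
        return (u, v)            # event occurred between the two visits
    return (v, math.inf)         # event not yet observed at the second visit


def observed_rectangle(x, u, v):
    """Bivariate case: censor each coordinate separately (illustrative only)."""
    return tuple(observed_interval(xi, ui, vi) for xi, ui, vi in zip(x, u, v))
```

For example, `observed_rectangle((0.5, 3.0), (1.0, 1.0), (2.0, 2.0))` returns the rectangle with sides `(0.0, 1.0)` and `(2.0, inf)`.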

Let $B_i$ denote any one of $[0, U_i]$, $(U_i, V_i]$ and $(V_i, +\infty)$. Therefore, a bivariate IC data
point is a rectangular region in $\mathbb{R}^2$ taking one of the nine forms in
$\mathcal{B} = \{B_k \times B_l : k, l = 1, 2, 3\}$. Given a sample of size $n$, the observations
$(L_{i1}, R_{i1}, L_{i2}, R_{i2})$ can be represented by rectangle subsets $I_i \in \mathcal{B}$, for
$i = 1, \ldots, n$. Define a maximal intersection (MI) $A$ of the observable rectangles $I_i$ to be a
nonempty finite intersection of the $I_j$'s such that $A \cap I_i = \emptyset$ or $A$, for each $i$.
Let $A_1, \ldots, A_m$ denote the distinct maximal intersections with respect to $I_1, \ldots, I_n$.
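
The definition of a maximal intersection can be made concrete with a small brute-force sketch. It is not the algorithm used in the report's software. It assumes, for simplicity, that every rectangle is closed, whereas the text uses half-open sides such as $(U_i, V_i]$, so rectangles that merely touch are treated here as intersecting. Under that assumption the lower-left corner of every maximal intersection is a pair of left endpoints, so it suffices to check which rectangles cover each such corner and keep the covering sets that are not contained in a larger one. The name `maximal_intersections` is hypothetical.

```python
def maximal_intersections(rects):
    """rects: list of closed rectangles ((l1, r1), (l2, r2)).
    Returns a list of (box, member_indices) pairs, one per maximal
    intersection, where box is the intersection of the member rectangles."""
    xs = sorted({r[0][0] for r in rects})   # candidate left endpoints, coordinate 1
    ys = sorted({r[1][0] for r in rects})   # candidate left endpoints, coordinate 2

    # For every candidate corner, record which rectangles contain it.
    covers = set()
    for a in xs:
        for b in ys:
            members = frozenset(
                i for i, ((l1, r1), (l2, r2)) in enumerate(rects)
                if l1 <= a <= r1 and l2 <= b <= r2
            )
            if members:
                covers.add(members)

    # Keep only the covering sets not strictly contained in another one.
    maximal = [m for m in covers if not any(m < other for other in covers)]

    result = []
    for m in sorted(maximal, key=sorted):
        box = (
            (max(rects[i][0][0] for i in m), min(rects[i][0][1] for i in m)),
            (max(rects[i][1][0] for i in m), min(rects[i][1][1] for i in m)),
        )
        result.append((box, sorted(m)))
    return result
```

The double loop over corners makes this cubic in the number of rectangles, which is acceptable only for small illustrative samples.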

The generalized likelihood function of $S$ is given by
$\Lambda_n = \mu_S(I_1) \times \cdots \times \mu_S(I_n)$, where
$\mu_S(\cdot)$ is the probability measure induced by $S$. Wong and Yu [2] show that the GMLE $\hat{S}$,
which maximizes $\Lambda_n$, must assign all the probability masses $s_1, \ldots, s_m$ to
$A_1, \ldots, A_m$. In general, $\hat{S}$ has to be obtained iteratively. Since $\hat{S}$ is also a
self-consistent estimate (SCE), we can implement the SCE algorithm by solving for
$s_1, \ldots, s_m$ in

$$
s_j = \frac{1}{n} \sum_{i=1}^{n} \frac{\delta_{ij} s_j}{\sum_{k=1}^{m} \delta_{ik} s_k},
$$

$j = 1, \ldots, m$, where $\delta_{ij} = \mathbf{1}[A_j \subset I_i]$, $\mathbf{1}[\cdot]$ denoting the
indicator function, and obtain an SCE of $S(x)$,

$$
\hat{S}(x) = \sum_{j:\, A_j \subset (x_1, +\infty) \times \cdots \times (x_d, +\infty)} s_j.
$$

With starting values $s_j = 1/m$ for all $j$, $\hat{S}(x)$ is the GMLE at convergence.
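
The self-consistency iteration can be sketched in a few lines of Python. This is a minimal illustration, not the report's software: it assumes the standard self-consistency update, in which each mass $s_j$ is replaced by the average over observations of $\delta_{ij} s_j / \sum_k \delta_{ik} s_k$, and it assumes every observed rectangle contains at least one maximal intersection (so no denominator is zero). The names `self_consistent_masses` and `survival`, the tolerance, and the iteration cap are hypothetical choices; `survival` further assumes closed boxes given as `((lo1, hi1), (lo2, hi2))`.

```python
def self_consistent_masses(delta, tol=1e-10, max_iter=10_000):
    """delta[i][j] = 1 if maximal intersection A_j lies inside rectangle I_i,
    else 0. Returns the masses s_1, ..., s_m at (approximate) convergence."""
    n, m = len(delta), len(delta[0])
    s = [1.0 / m] * m                      # starting values s_j = 1/m
    for _ in range(max_iter):
        # Probability currently assigned to each observed rectangle I_i.
        denom = [sum(d * sj for d, sj in zip(row, s)) for row in delta]
        new = [
            sum(delta[i][j] * s[j] / denom[i] for i in range(n)) / n
            for j in range(m)
        ]
        if max(abs(a - b) for a, b in zip(new, s)) < tol:
            return new
        s = new
    return s


def survival(x, boxes, s):
    """Estimated S(x): total mass on the maximal intersections that lie
    entirely inside the open quadrant (x_1, inf) x (x_2, inf)."""
    return sum(
        sj for box, sj in zip(boxes, s)
        if all(lo > xk for (lo, _), xk in zip(box, x))
    )
```

With four observations and two maximal intersections, `delta = [[1, 0], [1, 1], [0, 1], [0, 1]]`, the likelihood is proportional to $s_1 s_2^2$, and the iteration converges to $s_1 = 1/3$, $s_2 = 2/3$.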

In the first and second years of our research, we established consistency and
asymptotic normality of the GMLE $\hat{S}(x)$ under both discrete and continuous assumptions.
Additionally, we derived asymptotic properties of the weighted Kaplan-Meier test
statistics given by

$$
D = \int_{x > 0} W(x) \left( \hat{S}_A(x) - \hat{S}_B(x) \right) dx,
$$

where $W(\cdot)$ is a given weight function, and $A$ and $B$ refer to two comparison conditions.

A key feature of multivariate IC data, and a parameter of substantive importance, is the
correlation coefficient $\rho$ between a pair of the $X$ variables, say $X_1$ and $X_2$. The GMLE of
$\rho(X_1, X_2)$ is

$$
\hat{\rho}(X_1, X_2) =
\frac{\iint x_1 x_2 \, d\hat{F}(x_1, x_2) - \iint x_1 \, d\hat{F}(x_1, x_2) \iint x_2 \, d\hat{F}(x_1, x_2)}
{\left\{ \left[ \iint x_1^2 \, d\hat{F}(x_1, x_2) - \left( \iint x_1 \, d\hat{F}(x_1, x_2) \right)^2 \right]
\left[ \iint x_2^2 \, d\hat{F}(x_1, x_2) - \left( \iint x_2 \, d\hat{F}(x_1, x_2) \right)^2 \right] \right\}^{1/2}}.
$$

In a follow-up study involving interval censoring, it is often the case that not all events
will take place by the end of the study. In this situation, $\hat{\rho}$ will not provide a consistent
estimate of $\rho$. Let $\tau$ denote the largest follow-up time. A more appropriate correlation
coefficient to consider is

$$
\rho_\tau(X_1, X_2) =
\frac{\mathrm{Cov}(X_1, X_2 \mid X_1, X_2 \le \tau)}
{\sqrt{\mathrm{Var}(X_1 \mid X_1 \le \tau) \, \mathrm{Var}(X_2 \mid X_2 \le \tau)}}.
$$


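$\hat{F}$, the GMLE of $F_0$, is a discrete cdf with discontinuity points at the upper-right vertices
of the maximal intersections. Without loss of generality, let $a_1 < \cdots < a_m$ be the set
of partition points of the real line such that the set
$\{(a_i, a_j) : i, j \in \{0, 1, \ldots, m + 1\}\}$
contains all the discontinuity points of $\hat{F}$, where $a_0 = -\infty$ and $a_{m+1} = \infty$. Let
$s_{ij}$ denote the GMLE of the bivariate probability weight assigned to $(a_i, a_j)$ by $\hat{F}$.
The GMLE of $\rho_\tau$ is given by

$$
\hat{\rho}_\tau = \frac{E_{00} E_{12} - E_{10} E_{02}}
{\sqrt{\left[ E_{00} E_{11} - (E_{10})^2 \right] \left[ E_{00} E_{22} - (E_{02})^2 \right]}},
$$

where
$E_{12} = \sum_{a_i, a_j < \infty} a_i a_j s_{ij}$,
$E_{00} = \sum_{a_i, a_j < \infty} s_{ij}$,
$E_{10} = \sum_{a_i, a_j < \infty} a_i s_{ij}$,
$E_{02} = \sum_{a_i, a_j < \infty} a_j s_{ij}$,
$E_{11} = \sum_{a_i, a_j < \infty} a_i^2 s_{ij}$ and
$E_{22} = \sum_{a_i, a_j < \infty} a_j^2 s_{ij}$.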
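
The estimator of $\rho_\tau$ can be illustrated with a short Python sketch. It assumes that the $E$ quantities are the moment sums over the finite grid points implied by the ratio form of the estimator: total finite mass ($E_{00}$), first moments ($E_{10}$, $E_{02}$), cross moment ($E_{12}$) and second moments ($E_{11}$, $E_{22}$), each weighted by $s_{ij}$. The function name `truncated_correlation` and the nested-list layout of the masses are hypothetical.

```python
import math


def truncated_correlation(a, s):
    """a: finite partition points a_1 < ... < a_m.
    s: m x m nested list with s[i][j] the estimated mass at (a[i], a[j]).
    Mass at infinite grid points is left out of s, so the sums run over
    finite points only and need not add up to one."""
    e00 = e10 = e02 = e12 = e11 = e22 = 0.0
    for i, ai in enumerate(a):
        for j, aj in enumerate(a):
            w = s[i][j]
            e00 += w                 # total mass on finite points
            e10 += ai * w            # first moment, coordinate 1
            e02 += aj * w            # first moment, coordinate 2
            e12 += ai * aj * w       # cross moment
            e11 += ai * ai * w       # second moment, coordinate 1
            e22 += aj * aj * w       # second moment, coordinate 2
    num = e00 * e12 - e10 * e02
    den = math.sqrt((e00 * e11 - e10 ** 2) * (e00 * e22 - e02 ** 2))
    return num / den
```

As a sanity check, finite mass concentrated on the diagonal, `s = [[0.4, 0.0], [0.0, 0.4]]` with `a = [1.0, 2.0]`, gives a correlation of one (up to rounding), while equal mass on all four grid points gives zero. The function divides by zero if either coordinate has no variation over the finite points.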

From the consistency results of Wong and Yu [2] and Yu [3], we can show that $\hat{\rho}_\tau$ is
consistent under the assumption that the union of the support sets of the censoring variables
is dense. Moreover, if the range of the censoring vector is finite, $\hat{\rho}_\tau$ can be shown to be
asymptotically normally distributed. The asymptotic variance of $\hat{\rho}_\tau$ can be estimated by


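where $B = [\ldots]$, $s = \{s_{ij} : (i, j) \ne (m, m)\}'$, and $\mathcal{I}$ is the information
matrix, that is,

$$
\mathcal{I} = -\frac{\partial^2 \log \Lambda_n}{\partial s' \, \partial s}.
$$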

We  are  preparing  a  manuscript  to  report  these  findings. 

When the finite distribution assumption regarding the censoring vector is not met, we
shall have to resort to the proposed bootstrap method (Task 5) to investigate the asymptotic
behavior of $\hat{\rho}_\tau$. We shall devote our effort to this research topic in the fourth year,
a no-cost extension of our DOD grant.
D.  KEY  RESEARCH  ACCOMPLISHMENTS

•  We have expanded the scope of Task 5 to include a theoretical consideration of the
GMLE $\hat{\rho}$ of the correlation coefficient $\rho$ between a pair of correlated time-to-event
variables.

•  We have established consistency and asymptotic normality of $\hat{\rho}$ under a finite
distribution assumption.


E.  REPORTABLE  OUTCOMES 

•  Two published journal articles: [2], [4].

•  Computer programs for comprehensive GML inference, installed at
http://www.math.binghamton.edu/qyu/index/html.

F.  CONCLUSIONS 

In the past three years of our DOD grant, we have successfully accomplished the
research objectives stated in Tasks 1-4 and part of Task 5. Under the multivariate interval
censorship model, we have established consistency, asymptotic normality and asymptotic
efficiency of the GMLE under various assumptions. We have encountered and resolved a
methodological problem arising from the unexpected outcome that $\hat{S}$ may not be unique in
multivariate interval censoring. Also, we have derived asymptotic results for the GMLE of
the correlation coefficient between a pair of correlated time-to-event variables under a finite
distribution assumption. Finally, we have implemented computer programs for carrying out
the asymptotic GML procedure.

The results which we have established will be useful to breast cancer researchers pursuing
chemoprevention intervention trials involving multiple surrogate endpoint biomarkers,
and to genetic epidemiologists conducting studies on familial aggregation of breast cancer and
related cancers.